Abstract.

In a blind search for continuous gravitational wave signals scanning a wide
frequency band, one looks for candidate events with significantly large values of
the detection statistic. Unfortunately, a noise line in the data may also produce
a moderately large detection statistic.

In this paper, we describe how we can distinguish between noise line events 
and actual continuous wave (CW) signals, based on the shape of the detection 
statistic as a function of the signal's frequency. We will analyze the case of a 
particular detection statistic, the F statistic, proposed by Jaranowski, Krolak, 
and Schutz. 

We will show that for a broad-band 10 hour search, with a false dismissal rate
smaller than $10^{-6}$, our method rejects about 70% of the large candidate events
found in a typical data set from the second science run of the Hanford LIGO
interferometer.

1. Introduction

Narrow-band features of high power (spectral lines) are common in the output of
an interferometric gravitational wave (GW) detector. Although continuous
gravitational waves could show up as lines in the frequency domain, given the current
sensitivity of GW detectors it is most likely that large spectral features are noise of
terrestrial origin or statistical fluctuations.

Monochromatic signals of extraterrestrial origin are subject to a Doppler 
modulation due to the detector's relative motion with respect to the extraterrestrial 
GW source, while those of terrestrial origin are not. Matched filtering techniques to 
search for a monochromatic signal from a given direction in the sky demodulate the 
data based on the expected frequency modulation from a source in that particular 
direction. In general this demodulation procedure decreases the significance of a noise 
line and enhances that of a real signal. However, if the noise artifact is large enough, 
even after the demodulation it might still present itself as a statistically significant 
outlier, thus a candidate event. Our idea to discriminate between an extraterrestrial 
signal and a noise line is based on the different effect that the demodulation procedure 
has on a real signal and on a spurious one. 

If the data actually contains a signal, the detection statistic presents a very 
particular pattern around the signal frequency which, in general, a random noise 
artifact does not. We propose here a chi-square test based on the shape of the 
detection statistic as a function of the signal frequency and demonstrate its safety 
and its efficiency. We use the F detection statistic described in [1] and adopt the
same notation as [1]. For applications of the F statistic search on real data, see for
example [210 i). 

2. Method 

2.1. Summary of the method 

We consider in this paper a continuous GW signal such as we would expect from an
isolated non-axisymmetric rotating neutron star. Following the notation of [1], the
parameters that describe such a signal are its emission frequency $f_s$, the position in the
sky of the source $\vec{l}_s = (\alpha_s, \delta_s)$, the amplitude of the signal $h_0$, the inclination angle $\iota$,
the polarization angle $\psi$ and the initial phase of the signal $\Phi_0$.

In the absence of a signal, $2\mathcal{F}$ follows a $\chi^2$ distribution with four degrees of freedom
(which will be denoted by $\chi^2_4$). In the presence of a signal, $2\mathcal{F}$ follows a non-central $\chi^2_4$
distribution.
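As a quick numerical illustration of the noise-only case, the sketch below draws values of $2\mathcal{F}$ as the sum of four squared unit Gaussians, one standard way to realise a central $\chi^2_4$ variable, and evaluates the closed-form tail probability. The function names are hypothetical and the model is idealised; it is not the search code.

```python
import math
import random


def draw_two_f_noise_only(rng):
    """Draw one noise-only 2F value, modelled as a central chi-square with 4 dof.

    ASSUMPTION: idealised model, the sum of four independent squared unit Gaussians.
    """
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(4))


def chi2_4_tail(x):
    """Return P(2F > x) for a central chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


rng = random.Random(0)
samples = [draw_two_f_noise_only(rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)  # close to 4, the number of dof
tail_at_50 = chi2_4_tail(50.0)             # chance of 2F > 50 in pure Gaussian noise
```

In this idealised model a single noise-only template exceeds $2\mathcal{F} = 50$ with a probability of order $10^{-10}$, so outliers at that level are far more likely to come from a signal or a noise line than from Gaussian fluctuations.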

Given a set of template parameters $(\vec{l}, f)$, the detection statistic $\mathcal{F}$ is the
likelihood function maximized with respect to the parameters $p_s = (h_0, \iota, \psi, \Phi_0)$.
$\mathcal{F}$ is constructed by appropriately combining the complex amplitudes $F_a$ and $F_b$,
which represent the complex matched filters for the two GW polarizations. Given the
template parameters and the values of $F_a$ and $F_b$, it is possible to derive the maximum
likelihood values of $(h_0, \iota, \psi, \Phi_0)$; let us refer to these as $p_{MLE}$. It is thus possible,
for every value of the detection statistic, to estimate the parameters of the signal that
have most likely generated it. So, if we detect a large outlier in $\mathcal{F}$, we can estimate the
associated signal parameters $(\vec{l}, f, p_{MLE})$. Let us indicate with $\tilde{s}(t)$ the corresponding
signal estimate.

Let $x(t)$ be the original data set, and define a second data set

$$\tilde{x}(t) = x(t) - \tilde{s}(t). \qquad (1)$$

If the outlier were actually due to a signal $s(t)$ and if $\tilde{s}(t)$ were a good approximation
to $s(t)$, then $2\mathcal{F}$ constructed from $\tilde{x}(t)$ would be $\chi^2_4$ distributed.

Since filters for different values of $f$ are not orthogonal, in the presence of a signal
the detection statistic $\mathcal{F}$ also presents some structure at search frequencies
other than the actual signal frequency. For these other frequencies, $2\mathcal{F}$ constructed
from $\tilde{x}(t)$ is also $\chi^2_4$ distributed if $\tilde{s}(t)$ is a good approximation to $s(t)$.

We thus construct the veto statistic $V$ by summing the values of $2\mathcal{F}$ over several
frequencies. In particular, we sum over all the neighbouring frequency bins that,
within a certain frequency interval, are above a fixed significance threshold. We regard
each such collection of frequencies as a single "candidate event" and assign to it the
frequency of the bin that has the highest value of the detection statistic. The veto
statistic is then

$$V := \sum_{k \in \mathrm{event}} 2\mathcal{F}(f_k). \qquad (2)$$
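The grouping of neighbouring bins into events and the sum of Eq. (2) can be sketched as follows. This is a minimal illustration, assuming that an event is a run of consecutive bins above the threshold and that the values entering the sum (here `two_f_veto`) are those computed from the signal-subtracted data; the function and argument names are hypothetical.

```python
def find_events(two_f, threshold):
    """Group consecutive frequency bins with 2F above threshold into events.

    ASSUMPTION: an event is a run of adjacent bins above threshold.
    """
    events, current = [], []
    for k, value in enumerate(two_f):
        if value > threshold:
            current.append(k)
        elif current:
            events.append(current)
            current = []
    if current:
        events.append(current)
    return events


def veto_statistics(two_f, two_f_veto, threshold):
    """For each event return its peak bin, number of bins, and V (sum of 2F over the event)."""
    results = []
    for bins in find_events(two_f, threshold):
        peak_bin = max(bins, key=lambda k: two_f[k])  # bin that labels the event
        v = sum(two_f_veto[k] for k in bins)
        results.append({"peak_bin": peak_bin, "n_bins": len(bins), "V": v})
    return results


# Toy input: search values and values recomputed after signal subtraction.
two_f = [1.0, 12.0, 60.0, 15.0, 2.0, 11.0, 3.0]
two_f_veto = [0.5, 3.0, 4.0, 5.0, 1.0, 2.0, 0.1]
events = veto_statistics(two_f, two_f_veto, threshold=10.0)
```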

In reality, since our templates lie on a discrete grid, the parameters of a putative
signal will not exactly match any template's parameters and the signal estimate $\tilde{s}(t)$
will not be exactly correct. As a consequence, $\tilde{x}(t)$ will still contain a residual signal
and $2\mathcal{F}$ will not be exactly $\chi^2_4$ distributed. The larger the signal, the larger the residual
signal and the larger the expected value of $V$. Therefore, our veto threshold $V_{thr}$
will not be fixed but will depend on the value of $\mathcal{F}$. We will find such an $\mathcal{F}$-dependent
threshold for $V$ based on Monte Carlo simulations. The signal-to-noise ratio (SNR) for
any given value of the detection statistic can be expressed as
$\sqrt{2\mathcal{F}}$, as per Eq. (79) of [1]. Therefore we will talk equivalently of an
SNR-dependent or $\mathcal{F}$-dependent veto threshold.



2.2. Stationary Gaussian noise plus a signal with exactly known parameters 

Let us first examine the ideal case where the detector output consists of stationary
random Gaussian noise plus a systematic time series (a noise line or a pulsar signal)
that produces a candidate in the detection statistic $\mathcal{F}(f)$ for some template sky
position $\vec{l}$ and at frequency $f$. The question that we want to answer is: is the shape
of $\mathcal{F}(f)$ around the frequency of the candidate consistent with what we would expect
from a signal?

Our basic observables are the four real inner products $X_i(f, \vec{l})$ between the
observed time series $x(t)$ and the four filters $h_i(t; \vec{l}, f)$:

$$X_i(f, \vec{l}) = (x(t) \,\|\, h_i(t; \vec{l}, f)), \qquad (3)$$

where $i$ runs from 1 to 4. The inner product is defined by Eq. (42) of [1]. The four filters
$h_i(t; \vec{l}, f)$ depend on the target frequency $f$ and the target sky location $\vec{l} = (\alpha, \delta)$.
The hypothesis $H_0$ that we would like to examine is

$$H_0 : x(t) = n(t) + s(t; p_{MLE}, \vec{l}, f), \qquad (4)$$

where $n(t)$ is the detector noise and $s(t; p_{MLE}, \vec{l}, f) = A_i(p_{MLE}) h_i(t; \vec{l}, f)$ is the
template, which in this case perfectly matches the signal. The parameters $p_{MLE}$
are the maximum likelihood estimators of $h_0$, $\cos\iota$, $\psi$, $\Phi_0$ derived from the data and
the template parameters $\vec{l}$ and $f$. The definitions of the four coefficients $A_i$ are given
in [1].

Given that the template parameters $\vec{l}$, $f$ exactly match the parameters of the
actual signal, the waveform $s(t; p_{MLE}, \vec{l}, f)$ exactly matches the actual signal
$s(t)$. In this case the four variables $\tilde{X}_i(f, \vec{l})$,

$$\tilde{X}_i(f, \vec{l}) = X_i(f, \vec{l}) - (s(t) \,\|\, h_i(t; \vec{l}, f)), \qquad (5)$$

are four correlated random Gaussian variables. Ref. [1] constructs the detection
statistic $\mathcal{F}$ from the data $X_i(f)$. Similarly, we construct $\mathcal{F}_V(f) := \mathcal{F}(f; \tilde{X}(f))$ from
the data $\tilde{X}_i(f)$. $2\mathcal{F}_V(f)$ is also centrally $\chi^2_4$ distributed in the presence of a signal and
perfect signal-template match. We obtain the veto statistic by summing $2\mathcal{F}_V(f)$ over
the different frequencies of the event,

$$V := \sum_{k = k_1, \ldots, k_N \in \mathrm{event}} 2\mathcal{F}_V(f_k), \qquad (6)$$

where $N$ is the number of frequency bins in the event. If the value of $V$ is not
consistent with a $\chi^2_{4N-4}$ distribution, we reject the hypothesis $H_0$. Note that the
number of degrees of freedom of the veto statistic is $4N - 4$, as we use four data points to infer
the four parameters $p_s$.
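To make the consistency check concrete, the sketch below evaluates the tail probability of a central $\chi^2$ variable with $4N - 4$ degrees of freedom. Because $4N - 4$ is always even, the survival function has a closed form as a finite sum, so only the standard library is needed. The helper names are hypothetical, and the choice of a rejection level is left open, since the text does not specify one at this stage.

```python
import math


def chi2_sf_even(x, dof):
    """Survival function P(chi2_dof > x) for an even, positive number of dof.

    Uses the closed form exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def veto_p_value(v, n_bins):
    """Tail probability of V under H0 for an event with n_bins frequency bins (n_bins >= 2)."""
    return chi2_sf_even(v, 4 * n_bins - 4)


p_example = veto_p_value(8.0, 3)  # V = 8 with N = 3 bins, i.e. 8 degrees of freedom
```

An event with a single bin has zero degrees of freedom left after the four parameters are fitted, so the test only applies to events with $N \geq 2$.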

2.3. Real noise plus a signal with parameters mismatched with respect to the template

In the real analysis the signal parameters $\vec{l}_s$ will not exactly match the values of any
of our templates $\vec{l}$. As a consequence, $p_{MLE}$ will not exactly match the actual $p_s$
parameters, and the frequency where the maximum of the detection statistic occurs,
$f_{max}$, will not be the actual frequency of the signal $f_s$. However, we can still set up
a procedure to answer the question: is the shape of the F statistic event consistent
with what we would expect from a signal with parameters close to $\vec{l}$?

Suppose that an event has been identified for a position template $\vec{l}$ and for a value
of the signal frequency $f_{max}$. This is how the veto analysis would proceed:

1. we determine $p_{MLE}$ and $X_i(f_k, \vec{l})$ for each $f_k$ of the event.

2. we generate a veto signal $s(t; p_{MLE}, \vec{l}, f_{max})$ and compute the four variables

$$S_i(f_k, \vec{l}) = (s(t; p_{MLE}, \vec{l}, f_{max}) \,\|\, h_i(t; \vec{l}, f_k)).$$

3. we construct the variables

$$\tilde{X}_i(f_k, \vec{l}) = X_i(f_k, \vec{l}) - S_i(f_k, \vec{l}).$$

4. using Eq. (6) we compute $\mathcal{F}_V(f)$ and then $V$.

If $s(t; p_{MLE}, \vec{l}, f_{max})$ is a good approximation to $s(t; p_s, \vec{l}_s, f_s)$, then $V$ follows
the $\chi^2_{4N-4}$ distribution.
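Steps 3 and 4 of the procedure above can be sketched as below. The real mapping from the four projections to $2\mathcal{F}_V$ is the one defined in [1]; since it is not reproduced here, the sketch takes it as a user-supplied callable and, purely as a placeholder, defaults to a sum of squares, which would only be appropriate for orthonormal filters. All names are hypothetical.

```python
def veto_from_projections(x_proj, s_proj, two_f_from_x=None):
    """Subtract the veto-signal projections and accumulate V over the event.

    x_proj[k][i] holds X_i(f_k) and s_proj[k][i] holds S_i(f_k), with i = 0..3.
    Returns (V, dof) with dof = 4N - 4 for an event of N bins.
    """
    if two_f_from_x is None:
        # ASSUMPTION (placeholder only): orthonormal filters, so 2F is a sum of squares.
        def two_f_from_x(x):
            return sum(component * component for component in x)

    v = 0.0
    for x_k, s_k in zip(x_proj, s_proj):
        residual = [x - s for x, s in zip(x_k, s_k)]  # step 3: X_i - S_i
        v += two_f_from_x(residual)                    # step 4: add 2F_V(f_k)
    dof = 4 * len(x_proj) - 4
    return v, dof


# Toy event with two bins: the first is removed exactly, the second leaves a residual.
x_proj = [[1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 1.0, 1.0]]
s_proj = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
v_toy, dof_toy = veto_from_projections(x_proj, s_proj)
```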

2.4. SNR-dependent veto threshold

As already outlined at the end of Section 2.1, the veto statistic does not in general
follow a $\chi^2_{4N-4}$ distribution because in general the signal parameters do not exactly
match the template parameters. Due to this mismatch, when step 3 of
the procedure described in the previous section is performed, not all of the signal is removed from
$X_i$. Consequently $V$ acquires a non-zero non-centrality parameter. Since this scales as $h_0^2$
in the presence of a signal, the veto statistic threshold has to change with the SNR
of the candidate event in order to keep the false dismissal rate constant for a range
of different signal strengths. We will thus adopt an SNR-dependent veto threshold on
our veto statistic $V$. We will determine the threshold $V_{thr}(\mathrm{SNR})$ via Monte Carlo
simulations.

An SNR-dependent threshold in a similar context was first used by the TAMA 
group in] who performed SNR-V/do/ studies to veto out candidate events in their 
inspiral waves searches. See also (HI for a detailed description of a time-frequency 
test. In a context of a resonant bar detector burst search, see [J]. 

3. Application 

To determine the false dismissal rate, the false alarm rate and the threshold equation 
for the veto statistic, we have performed a set of Monte Carlo simulations on artificial 
and real noise. We have used 10 hours of fake stationary Gaussian noise and of real
science data from the LIGO Hanford 4 km interferometer. The results presented here
are thus valid for a 10 hour observation time, which is the observation time of the
all-sky, wide-band search that we plan to conduct on data from the second science run
of the LIGO detectors. We do not take into account the spin-down of pulsars, which may
be justified given the short duration of the data.

As will be explained below, we have injected both signals and spurious noise
artifacts of the type that we observe in the detector output. The parameters of the
gravitational wave signals that are injected into the noise are chosen uniformly
at random in the following ranges: $f_s \in [100, 500]$ Hz, $\alpha_s \in [0, 2\pi]$, $\sin\delta_s \in [-1, 1]$,
$\cos\iota \in [-1, 1]$, $\psi \in [-\pi/2, \pi/2]$, $\Phi_0 \in [0, 2\pi]$. The strain $h_0$ or the amplitude of the
model noise line is also randomly chosen in such a way that the resulting detection
statistic values lie in the range $\sqrt{50} < \sqrt{2\mathcal{F}_{max}} < 70$. Below $2\mathcal{F} = 50$ the efficiency
of the test quickly degrades. We will thus not apply this veto technique to candidate
events with $2\mathcal{F} < 50$. In this sense, our method is designed to discard only large
outliers.
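The parameter draws described above can be sketched as follows. This is an illustration only: the dictionary keys and function names are hypothetical, and the choice of $h_0$ is not modelled because the text only states that it is tuned so that the resulting $2\mathcal{F}_{max}$ falls in the quoted range.

```python
import math
import random


def draw_signal_parameters(rng):
    """Draw one set of injection parameters uniformly in the ranges quoted in the text."""
    return {
        "f_s": rng.uniform(100.0, 500.0),                   # Hz
        "alpha_s": rng.uniform(0.0, 2.0 * math.pi),         # right ascension, rad
        "delta_s": math.asin(rng.uniform(-1.0, 1.0)),       # uniform in sin(delta)
        "cos_iota": rng.uniform(-1.0, 1.0),
        "psi": rng.uniform(-math.pi / 2.0, math.pi / 2.0),
        "phi_0": rng.uniform(0.0, 2.0 * math.pi),
    }


def in_veto_range(two_f_max):
    """True when sqrt(50) < sqrt(2F_max) < 70, i.e. 50 < 2F_max < 4900."""
    return 50.0 < two_f_max < 4900.0


rng = random.Random(42)
injections = [draw_signal_parameters(rng) for _ in range(1000)]
```

Drawing $\sin\delta_s$ and $\cos\iota$ uniformly, rather than the angles themselves, corresponds to an isotropic distribution of sky positions and spin-axis orientations.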

3.1. Safety test 

3.1.1. Signals in random Gaussian stationary noise We have performed $2 \times 10^6$
Monte Carlo simulations. The following steps were executed iteratively 200 times:

• We randomly choose a signal frequency $f_s$ of a simulated gravitational wave and
then follow the steps below 100 times:
— we randomly choose a signal sky position $\vec{l}_s = (\alpha_s, \delta_s)$ and perform the steps
below 100 times:

1. we randomly choose a set of $p_s$ signal parameters and generate the 10
hour long data set described above, consisting of random Gaussian noise
and the fake signal.

2. we randomly displace the sky position template from the signal values
by adding a random number uniformly distributed between ± half the
sky position grid spacing: $|\alpha - \alpha_s| < 0.01$ and $|\delta - \delta_s| < 0.01$ (both in
radians). The grid spacing was estimated numerically and ensures that
the loss in $\mathcal{F}$ due to the signal-template mismatch is less than 5% for
99% of the simulations, for a 10 hour observation.

3. we search for the signal with template values $\vec{l}$ in a small frequency range
around $f_s$. This results in the identification of an event, defined by a
value of $\mathcal{F}$ (the highest of all the $\mathcal{F}$ values in the event, denoted by
$\mathcal{F}_{max}$) occurring at a frequency $f_{max}$. Based on this we determine the
maximum likelihood estimators $p_{MLE}$. Also, we compute $X_i(f_k, \vec{l})$ for
all the frequencies of the event.

4. we generate a veto signal $s(t; p_{MLE}, \vec{l}, f_{max})$.

5. we compute $S_i(f_k, \vec{l})$ for all the frequencies of the event.

6. we construct $V$ from $\tilde{X}_i(f_k, \vec{l}) = X_i(f_k, \vec{l}) - S_i(f_k, \vec{l})$.
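The loop structure above, and the random template displacement of step 2, can be sketched as follows. The sketch only walks the three nested loops and draws the sky offsets; data generation, the search and the veto computation are omitted, and the function names are hypothetical.

```python
import random

HALF_GRID_SPACING = 0.01  # radians, the half grid spacing quoted in the text


def displace_template(alpha_s, delta_s, rng):
    """Offset the template sky position by up to half a grid spacing in each coordinate.

    ASSUMPTION: no wrap-around in alpha and no special handling of delta near the poles.
    """
    alpha = alpha_s + rng.uniform(-HALF_GRID_SPACING, HALF_GRID_SPACING)
    delta = delta_s + rng.uniform(-HALF_GRID_SPACING, HALF_GRID_SPACING)
    return alpha, delta


def count_trials(n_freq=200, n_sky=100, n_params=100):
    """Walk the three nested loops and count the innermost iterations."""
    count = 0
    for _ in range(n_freq):            # signal frequencies
        for _ in range(n_sky):         # sky positions per frequency
            for _ in range(n_params):  # parameter sets per sky position
                count += 1
    return count


rng = random.Random(7)
alpha_t, delta_t = displace_template(1.0, 0.3, rng)
n_trials = count_trials()
```

With the default loop sizes the count is 200 × 100 × 100 injections in total.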

By considering only values of $\sqrt{50} < \sqrt{2\mathcal{F}} < 70$ we obtain 1426915 sets of $\mathcal{F}$ and
$V$ values. Fig. 1 is the scatter plot of these. It is convenient to normalize each $V$
by the corresponding number of degrees of freedom, dof, because the number of
frequency bins in a candidate, and hence dof, varies from
event to event. If $V$ follows the $\chi^2_{dof}$ distribution, then the mean of $V$ is just dof.
Fig. 2 shows the estimated probability distributions of the normalized veto statistic $\sqrt{V/\mathrm{dof}}$ for four selected
ranges of $\mathcal{F}$. These four graphs show that the probability distributions are well-defined
and that the Monte Carlo simulations give a good estimate of the probability distribution
of the normalized veto statistic. Since a variable mismatch exists between the signals and the templates, the
distribution of $V$ is actually not strictly a central $\chi^2_{dof}$, and the expected value of the normalized veto statistic is
thus not strictly 1. As expected, the peak of the distributions in Fig. 2 deviates
from 1 more as the signal becomes larger.

From Fig. 1 one can now define the threshold on the veto statistic, based on the
false dismissal rate that one is willing to accept. The solid line in the figure, with
equation

$$\sqrt{2\mathcal{F}_{max}} = 10\sqrt{\frac{V}{\mathrm{dof}}} - 10 \quad (7 \times 10^{-7} \text{ false dismissal}), \qquad (7)$$

is the line with the lowest false alarm rate for which, with our sample size, we have not
falsely dismissed any of the injected signals. In the rest of this paper, we will adopt
this line, Eq. (7), as the nominal threshold line.
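Reading the nominal threshold line of Eq. (7) as $\sqrt{2\mathcal{F}_{max}} = 10\sqrt{V/\mathrm{dof}} - 10$, the veto decision for a single candidate can be sketched as below. The direction of the inequality is an assumption drawn from the surrounding discussion: a candidate is rejected when its normalized veto statistic is too large for its SNR. Function names are hypothetical.

```python
import math


def threshold_snr(v, dof):
    """Right-hand side of the nominal threshold line: 10 * sqrt(V / dof) - 10."""
    return 10.0 * math.sqrt(v / dof) - 10.0


def is_vetoed(two_f_max, v, dof):
    """Return True when the candidate falls on the large-V side of the threshold line.

    ASSUMPTION: candidates with sqrt(2F_max) below the line are rejected; the line is
    only meant for the calibrated range 50 < 2F_max < 4900.
    """
    return math.sqrt(two_f_max) < threshold_snr(v, dof)


# Candidate with SNR = 10 and dof = 8: the line sits at sqrt(V / dof) = 2, i.e. V = 32.
kept = is_vetoed(100.0, 16.0, 8)       # signal-like shape, V / dof = 2
rejected = is_vetoed(100.0, 200.0, 8)  # poor shape match, V / dof = 25
```

Because the line was tuned on 10 hours of LHO-like data, the constants 10 and 10 should be treated as specific to that playground set rather than as general values.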

3.1.2. Signals in real data - LHO 10 hours We have performed 2 x 10^ Monte Carlo 
simulations by injecting a simulated pulsar signal into real data. All the steps are 
similar to those described in Section 3.1.1. We have avoided injections in bands contaminated
by spectral disturbances.

From this set of simulations, we obtain a scatter plot similar to Fig. 1. Indeed,
the threshold line Eq. (7) still does not dismiss any injected signals.

3.2. Efficiency test 

To study how efficient the test is in vetoing noise artifacts that resemble the signals
that we are trying to detect, we have performed an additional set of simulations. For
each simulation, we have injected sets of time-domain exponentially damped sinusoids
(as a model of a noise line) into both fake Gaussian noise and real data. In the
frequency domain these damped sinusoids have a Cauchy (Lorentzian) line shape,
a feature that is often observed in the real data. We then follow steps similar to those of
the safety tests described above and produce the corresponding scatter plots of SNR
versus $V$.

Figure 1. A scatter plot of the veto statistic and the signal-to-noise ratio (SNR)
for sets of 10 hours of simulated data. Each data set consists of Gaussian noise
plus a software-simulated signal. Each dot in this plot represents a candidate
event detected by our search code. The veto statistic in this plot is $\sqrt{V/\mathrm{dof}}$,
and SNR = $\sqrt{2\mathcal{F}}$. The straight line represents Eq. (7). The detector is assumed
to be the LHO detector. The number of data points with $\sqrt{50} < \mathrm{SNR} < 70$ is
1426915.

Figure 2. The estimated probability distributions of $\sqrt{V/\mathrm{dof}}$ with SNR in the four
selected ranges. This figure is for the 10 hours of simulated data. The detector is
assumed to be the LHO detector.

The efficiency test here is ill-defined in the sense that it is possible to generate
an infinite number of noise lines that have completely different shapes from pulsar
signals. Nonetheless, we think these tests provide "a feel" for the efficiency of our
veto method.

3.2.1. Noise lines in random Gaussian stationary noise The following steps are 
iteratively performed 200 times: 

• we randomly choose the frequency of a noise line and follow the steps below 100 
times: 

— we randomly choose a target sky position, with uniform distribution in
$\alpha \in [0, 2\pi]$, $\sin\delta \in [-1, 1]$, and then perform the steps below 50 times:

1. we randomly choose the noise line parameters. The e-fold decay rate
varies between $0.01/T_0$ and $2/T_0$, where $T_0$ is the total observation time.

2. we generate a 10 hour long data set consisting of random Gaussian noise
with standard deviation 1 and the noise line defined by the parameters
above.

3. we perform a search in a frequency band around the frequency of the
noise line and identify an event, i.e. a value of $\mathcal{F}$ and $f_{max}$. From the
values of the complex components of the detection statistic at $f_{max}$ we
determine $p_{MLE}$ and $X_i(f_k, \vec{l})$ for every frequency of the event.

4. we generate a veto signal $s(t; p_{MLE}, \vec{l}, f_{max})$.

5. we compute $S_i(f_k, \vec{l})$ for the veto signal for all the frequencies of the
event.

6. we obtain $V$.
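The model noise line of steps 1 and 2 can be sketched as follows. This is a minimal illustration with hypothetical function names: the decay rate is drawn uniformly in the quoted range (the text gives only the range, so the uniform draw is an assumption), and the Lorentzian profile is the idealised spectrum of an infinitely long damped sinusoid near its central frequency, so it is only approximate for a line truncated at $T_0$.

```python
import math
import random

T0 = 10 * 3600.0  # total observation time in seconds (10 hours)


def draw_decay_rate(rng, t0=T0):
    """Draw an e-fold decay rate between 0.01/T0 and 2/T0 (ASSUMPTION: uniform draw)."""
    return rng.uniform(0.01 / t0, 2.0 / t0)


def damped_sinusoid(amplitude, freq_hz, decay_rate, sample_rate_hz, n_samples):
    """Sample A * exp(-decay_rate * t) * sin(2 pi f t) at t = k / sample_rate_hz."""
    dt = 1.0 / sample_rate_hz
    return [
        amplitude * math.exp(-decay_rate * k * dt) * math.sin(2.0 * math.pi * freq_hz * k * dt)
        for k in range(n_samples)
    ]


def lorentzian_profile(f, f0, decay_rate):
    """Relative power near f0 for an infinitely long damped sinusoid (equals 1 at f = f0)."""
    return decay_rate ** 2 / ((2.0 * math.pi * (f - f0)) ** 2 + decay_rate ** 2)


rng = random.Random(1)
gamma = draw_decay_rate(rng)
line = damped_sinusoid(1.0, 5.0, 0.5, 100.0, 200)  # short toy line, not a 10 hour one
```

In this idealised profile the half width at half maximum is `decay_rate / (2 * math.pi)` in Hz, so slower decays give narrower lines.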

Fig. 3 shows the SNR-$V$ plot. It may seem that the data points are densely
distributed in the upper left region, with large SNRs and small $V$. This deceptive
appearance is due to the coarse graphical resolution of the figure, as can be clearly
seen in the estimated probability distributions shown in Fig. 4. In fact, if we take
our nominal threshold line, Eq. (7), shown as the solid line in Fig. 3, the false alarm
rate is estimated to be 8.4%.

3.2.2. Real data: LHO 10 hours We have performed 10^ Monte-Carlo simulations 
injecting noise lines as described above into real data, avoiding frequency bands with 
large noise artifacts. 

The resulting scatter plot is similar to that obtained for the Gaussian random
noise case. Indeed, we obtain a 5.1% false alarm rate for the nominal threshold Eq. (7).

3.3. Application to real data 

Having observed the safety and efficiency of our veto method, we now show an application
of the method to real data (with neither signals nor noise lines injected).
We take the following steps iteratively 1200 times:

• we randomly choose a template sky direction $\vec{l}$ over the whole sky

1. we perform a wide-band search over the interval [100, 500] Hz in the 10 hour
real data set. We identify events in the detection statistic and to each of
these events we apply our veto test. This procedure yields a value of $\mathcal{F}$ and
$V$ for each candidate event.




Figure 3. A scatter plot of the veto statistic and the signal-to-noise ratio (SNR)
for sets of 10 hours of simulated data. Each data set consists of Gaussian noise plus a
software-simulated noise line. Each dot in this plot represents a candidate event
detected by our search code. The veto statistic in this plot is $\sqrt{V/\mathrm{dof}}$, and
SNR = $\sqrt{2\mathcal{F}}$. The straight line represents Eq. (7). The detector is assumed to be
the LHO detector. The number of data points with $\sqrt{50} < \mathrm{SNR} < 70$ is 954063.

Figure 4. The estimated probability distributions of $\sqrt{V/\mathrm{dof}}$ with SNR in the four
selected ranges, corresponding to Fig. 3.

Figure 5. A scatter plot of the veto statistic and the signal-to-noise ratio (SNR)
for 10 hours of real data from the LHO detector. Each dot in this plot represents a
candidate event detected by our search code. The veto statistic in this plot is
$\sqrt{V/\mathrm{dof}}$, and SNR = $\sqrt{2\mathcal{F}}$. The straight line represents Eq. (7). The
number of data points is 68388. The maximum $\sqrt{2\mathcal{F}_{max}}$ is 66.6.

The SNR-$V$ scatter plot is shown in Fig. 5. Two distinct branches along the solid
line at higher SNRs are evident. Both branches are due to spectral features in the
data: the highest SNR branch to a line at 465.7 Hz, the lower branch to a line at
128.0 Hz. These spectral features trigger a whole set of templates, giving rise to
the observed structure in the scatter plot. If we adopt the threshold line Eq. (7), 70%
of the events are rejected.

4. Discussions 

We have defined a veto statistic to reject or accept candidate CW events based 
on a consistency shape test of the measured detection statistic. We have shown 
how to derive the SNR-dependent threshold for the veto test, through Monte Carlo 
simulations on a playground data set similar to the one that one intends to analyze. 

The veto method demonstrated in this paper does not require any a-priori 
information on the source of noise lines. However, we expect that the effectiveness 
of this veto technique can greatly benefit from data characterization studies aimed 
at identifying spectral contamination of instrumental origin. We are now further
investigating methods to veto the families of outliers identified in the scatter plot above
the solid line in Fig. 5. Natural candidates are those noise lines whose properties are
known experimentally, for example the 16 Hz harmonics in the LHO data due to the
data acquisition system. It is precisely these harmonics that give rise to one of the
major branches above the solid line in Fig. 5, as shown in Fig. 6.

Figure 6. The same plot as Fig. 5, but after removing the 16 Hz harmonics. The
number of data points is 60375. The maximum $\sqrt{2\mathcal{F}_{max}}$ is 66.6.

In this paper, we have used a 10 hour long data set. For a longer observation
time, the difference between an extraterrestrial line and a terrestrial one becomes larger
because the Doppler modulation patterns of a putative signal carry a more specific 
signature, that of the motion of the Earth around the Sun. We have not included
the spin-down of pulsars in our current study, as the data set we have used is short enough.
Spin-down effects become more important for longer observation
times, and they generate a characteristic feature in the F statistic shape.
We thus expect that our veto method will become more efficient and safer for longer
observation times.

Finally, we note that the veto threshold line varies depending on the length of the
observational data and on the noise behavior. The threshold line Eq. (7) is specific to 10 hours
of LHO data in a particular band, and we recommend that any search using
a data set quite different from our playground data determine its own threshold line
based on playground data for that analysis.